<!DOCTYPE html>

<html>

  <head>
    <title>Underactuated Robotics: Optimization and
  Mathematical Programming</title>
    <meta name="Underactuated Robotics: Optimization and
  Mathematical Programming" content="text/html; charset=utf-8;" />
    <link rel="canonical" href="http://underactuated.mit.edu/optimization.html" />

    <script src="https://hypothes.is/embed.js" async></script>
    <script type="text/javascript" src="htmlbook/book.js"></script>

    <script src="htmlbook/mathjax-config.js" defer></script> 
    <script type="text/javascript" id="MathJax-script" defer
      src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js">
    </script>
    <script>window.MathJax || document.write('<script type="text/javascript" src="htmlbook/MathJax/es5/tex-chtml.js" defer><\/script>')</script>

    <link rel="stylesheet" href="htmlbook/highlight/styles/default.css">
    <script src="htmlbook/highlight/highlight.pack.js"></script> <!-- http://highlightjs.readthedocs.io/en/latest/css-classes-reference.html#language-names-and-aliases -->
    <script>hljs.initHighlightingOnLoad();</script>

    <link rel="stylesheet" type="text/css" href="htmlbook/book.css" />
  </head>

<body onload="loadChapter('underactuated');">

<div data-type="titlepage">
  <header>
    <h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a></h1>
    <p data-type="subtitle">Algorithms for Walking, Running, Swimming, Flying, and Manipulation</p> 
    <p style="font-size: 18px;"><a href="http://people.csail.mit.edu/russt/">Russ Tedrake</a></p>
    <p style="font-size: 14px; text-align: right;"> 
      &copy; Russ Tedrake, 2020<br/>
      Last modified <span id="last_modified"></span>.</br>
      <script>
      var d = new Date(document.lastModified);
      document.getElementById("last_modified").innerHTML = d.getFullYear() + "-" + (d.getMonth()+1) + "-" + d.getDate();</script>
      <a href="tocite.html">How to cite these notes</a> &nbsp; | &nbsp;
      <a target="_blank" href="https://docs.google.com/forms/d/e/1FAIpQLSesAhROfLRfexrRFebHWLtRpjhqtb8k_iEagWMkvc7xau08iQ/viewform?usp=sf_link">Send me your feedback</a><br/>
    </p>
  </header>
</div>

<p><b>Note:</b> These are working notes used for <a
href="http://underactuated.csail.mit.edu/Spring2020/">a course being taught
at MIT</a>. They will be updated throughout the Spring 2020 semester.  <a 
href="https://www.youtube.com/channel/UChfUOAhz7ynELF-s_1LPpWg">Lecture  videos are available on YouTube</a>.</p> 

<table style="width:100%;"><tr style="width:100%">
  <td style="width:33%;text-align:left;"><a class="previous_chapter" href=multibody.html>Previous Chapter</a></td>
  <td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
  <td style="width:33%;text-align:right;"><a class="next_chapter" href=playbook.html>Next Chapter</a></td>
</tr></table>


<!-- EVERYTHING ABOVE THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<chapter class="appendix" style="counter-reset: chapter 2"><h1>Optimization and
  Mathematical Programming</h1>

  <p>The view of dynamics and controls taken in these notes builds heavily on
  tools from optimization -- and our success in practice depends heavily on the
  effective application of numerical optimization.  There are many excellent
  books on optimization, for example <elib>Nocedal06</elib> is an excellent
  reference on smooth optimization and <elib>Boyd04a</elib> is an excellent
  reference on convex optimization (which we use extensively).  I will provide
  more references for the specific optimization formulations below.</p>
  
  <p>The intention of this chapter is, therefore, mainly to provide a launchpad
  and to address some topics that are particularly relevant in robotics but
  might be more hidden in the existing general treatments.  You can get far very
  quickly as a consumer of these tools with just a high-level understanding.
  But, as with many things, sometimes the details matter and it becomes
  important to understand what is going on under the hood.  We can often
  formulate an optimization problem in multiple ways that might be
  mathematically equivalent, but perform very differently in practice.</p>

  <section><h1>Optimization software</h1>

    <p>Some of the algorithms from optimization are quite simple to implement
    yourself; stochastic gradient descent is perhaps the classic example.  Even
    more of them are conceptually fairly simple, but for some of the algorithms,
    the implementation details matter a great deal -- the difference between an
    expert implementation and a novice implementation of the numerical recipe,
    in terms of speed and/or robustness, can be dramatic.  These packages often
    use a wealth of techniques for numerically conditioning the problems, for
    discarding trivially valid constraints, and for warm-starting optimization
    between solves.  Not all solvers support all types of objectives and
    constraints, and even if we have two commercial solvers that both claim to
    solve, e.g. quadratic programs, then they might perform differently on the
    particular structure/conditioning of your problem.  In some cases, we also
    have nicely designed open-source alternatives to these commercial solvers
    (often written by academics who are experts in optimization) -- sometimes
    they compete well with the commercial solvers or even outperform them in
    particular problem types.</p>

    <p>As a result, there are also a handful of software packages that attempt
    to provide a layer of abstraction between the formulation of a mathematical
    program and the instantiation of that problem in each of the particular
    solvers.  Famous examples of this include <a
    href="http://cvxr.com/cvx/">CVX</a> and <a
    href="https://yalmip.github.io/">YALMIP</a> for MATLAB, and <a
    href="https://www.juliaopt.org/JuMP.jl/stable/">JuMP</a> for Julia.
    <drake></drake>'s MathematicalProgram classes provide a middle layer like
    this for C++ and Python; its creation was motivated initially and
    specifically by the need to support the optimization formulations we use in
    this text.</p>

    <p>We have a number of tutorials on mathematical programming in
    <drake></drake>, starting with a general introduction <a
    href="https://mybinder.org/v2/gh/RobotLocomotion/drake/nightly-release?filepath=tutorials/mathematical_program.ipynb">here</a>.  Drake <a href="https://drake.mit.edu/doxygen_cxx/group__solvers.html">supports a number of custom, open-source, and commercial solvers</a> (and even some of the commercial solvers are free to academic users).</p>

  </section>

  <section><h1>General concepts</h1>

    <p>The general formulation is ... </p>

    <p>It's important to realize that even though this formulation is incredibly
    general, it does have it's limits.  As just one example, when write
    optimizations to plan trajectories of a robot, in this formulation we
    typically have to choose a-priori as particular number of decision variables
    that will encode the solution.  Although we can of course write algorithms
    that change the number of variables and call the optimizer again, somehow I
    feel that this "general" formulation fails to capture, for instance, the
    type of mathematical programming that occurs in sample-based optimization
    planning -- where the resulting solutions can be described by any finite
    number of parameters, and the computation flexibly transitions amongst
    them.</p>

    <subsection><h1>Convex vs nonconvex optimization</h1>
    
      <p>Local optima.  Convex functions, convex constraints.</p>

      <p>If an optimization problem is nonconvex, it does not necessarily mean
      that the optimization is hard. There are many cases in deep learning where
      we can reliably solve seemingly very high-dimensional and nonconvex
      optimization problems.  Our understanding of these phenomenon is evolving
      rapidly, and I suspect anything I write here will shortly become outdated.
      But the current thinking in supervised learning mostly revolves around the
      idea that over-parameterization is key -- that many of these success
      stories are happening in a regime where we actually have more decision
      variables than we have data, and that the search space is actually dense
      with solutions that can fit the data perfectly (the so-called
      "interpolating solutions"), that not all global minima are equally robust,
      and that the optimization algorithms are performing some form of implicit
      regularization in order to pick a "good" interpolating solution.</p>
    
    </subsection>

    <subsection><h1>Constrained optimization with Lagrange multipliers</h1>
  
      <p>Given the equality-constrained optimization problem $$\minimize_\bz
      \ell(\bz) \quad \subjto \quad \bphi(\bz) = 0,$$ where $\bphi$ is a vector.
      Define a vector $\lambda$ of Lagrange multipliers, the same size as
      $\phi$, and the scalar Lagrangian function, $$L(\bz,{\bf \lambda}) =
      \ell(\bz) + \lambda^T\phi(\bz).$$ A necessary condition for $\bz^*$ to be
      an optimal value of the constrained optimization is that the gradients of
      $L$ vanish with respect to both $\bz$ and $\lambda$: $$\pd{L}{\bz} = 0,
      \quad \pd{L}{\lambda} = 0.$$  Note that $\pd{L}{\lambda} = \phi(\bz)$, so
      requiring this to be zero is equivalent to requiring the constraints to be
      satisfied.</p>

      <example><h1>Optimization on the unit circle</h1>

        <p>Consider the following optimization: $$\min_{x,y} x+y, \quad \subjto
        \quad x^2 + y^2 = 1.$$ The level sets of $x+y$ are straight lines with
        slope $-1$, and the constraint requires that the solution lives on the
        unit circle.</p> <figure> <img width="80%"
        src="figures/lagrange_multipliers_unit_circle.svg"/> </figure>
        <p>Simply by inspection, we can determine that the optimal solution
        should be $x=y=-\frac{\sqrt{2}}{2}.$ Let's make sure we can obtain the
        same result using Lagrange multipliers. </p>

        <p>Formulating $$L = x+y+\lambda(x^2+y^2-1),$$ we can take the
        derivatives and solve

        \begin{align*}
        \pd{L}{x} = 1 + 2\lambda x = 0 \quad & \Rightarrow & \lambda = -\frac{1}{2x}, \\
        \pd{L}{y} = 1 + 2\lambda y = 1 - \frac{y}{x} = 0 \quad & \Rightarrow & y = x,\\
        \pd{L}{\lambda} = x^2+y^2-1 = 2x^2 -1 = 0 \quad & \Rightarrow & x = \pm \frac{1}{\sqrt{2}}.
        \end{align*}

        Given the two solutions which satisfy the necessary conditions, the
        negative solution is clearly the minimizer of the objective.</p>

      </example>

    </subsection>

  </section>

  <section><h1>Convex optimization</h1>
    <subsection><h1>Linear Programs/Quadratic Programs/Second-Order Cones</h1>
      Example: Balancing force control on Atlas
    </subsection>
    <subsection><h1>Semidefinite Programming and Linear Matrix Inequalities</h1>
    </subsection>

    <subsection id=sums_of_squares><h1>Sums-of-squares optimization</h1>

      <p>It turns out that in the same way that we can use SDP to search over
      the positive quadratic equations, we can generalize this to search over
      the positive polynomial equations.  To be clear, for quadratic equations
      we have \[ {\bf P}\succeq 0 \quad \Rightarrow \quad \bx^T {\bf P} \bx \ge
      0, \quad \forall \bx. \]  It turns out that we can generalize this to \[
      {\bf P}\succeq 0 \quad \Rightarrow \quad {\bf m}^T(\bx) {\bf P} {\bf
      m}(\bx) \ge 0, \quad \forall \bx, \] where ${\bf m}(\bx)$ is a vector of
      polynomial equations, typically chosen as a vector of <em>monomials</em>
      (polynomials with only one term).  The set of positive polynomials
      parameterized in this way is exactly the set of polynomials that can be
      written as a <em>sum of squares</em><elib>Parrilo00</elib>.  While it is
      known that not all positive polynomials can be written in this form, much
      is known about the gap.  For our purposes this gap is very small (papers
      have been written about trying to find polynomials which are uniformly
      positive but not sums of squares); we should remember that it exists but
      not worry about it too much for now.</p>

      <p>Even better, there is quite a bit that is known about how to choose
      the terms in ${\bf m}(\bx)$.  For example, if you give me the polynomial
      \[ p(x) = 2 - 4x + 5x^2 \] and ask if it is positive for all real $x$, I
      can convince you that it is by producing the sums-of-squares
      factorization \[ p(x) = 1 + (1-2x)^2 + x^2, \] and I know that I can
      formulate the problem without needing any monomials of degree greater
      than 1 (the square-root of the degree of $p$) in my monomial vector.  In
      practice, what this means for us is that people have authored
      optimization front-ends which take a high-level specification with
      constraints on the positivity of polynomials and they automatically
      generate the SDP problem for you, without having to think about what
      terms should appear in ${\bf m}(\bx)$ (c.f. <elib>Prajna04</elib>).
      This class of optimization problems is called Sums-of-Squares (SOS)
      optimization.</p>

      <p>As it happens, the particular choice of ${\bf m}(\bx)$ can have a huge
      impact on the numerics of the resulting semidefinite program (and on your
      ability to solve it with commercial solvers). <drake></drake> implements
      some particularly novel/advanced algorithms to make this work well
      <elib>Permenter17</elib>.</p>

      <p>We will write optimizations using sums-of-squares constraints as \[
      p(\bx) \sos \] as shorthand for the constraint that $p(\bx) \ge 0$ for all
      $\bx$, as demonstrated by finding a sums-of-squares decomposition.</p>

      <p>This is surprisingly powerful.  It allows us to use convex
      optimization to solve what appear to be very non-convex optimization
      problems.</p>

      <example><h1>Global minimization via SOS</h1>

        <p>Consider the following famous non-linear function of two variables
        (called the "six-hump-camel": $$p(x) = 4x^2 + xy - 4y^2 - 2.1x^4 + 4y^4
        + \frac{1}{3}x^6.$$  This function has six local minima, two of them
        being global minima <elib>Henrion09a</elib>.</p>

        <div  style="text-align:center"><img
                src="figures/six_hump_camel.svg" width="90%"/></div>

        <p>By formulating a simple sums-of-squares optimization, we can
        actually find the minimum value of this function (technically, it is
        only a lower bound, but in this case and many cases, it is surprisingly
        tight) by writing: \begin{align*} \max_\lambda \ \ & \lambda \\
        \text{s.t. } & p(x) - \lambda \text{ is sos.} \end{align*}  Go ahead
        and play with the code (most of the lines are only for plotting; the
        actual optimization problem is nice and simple to formulate). </p>

        <jupyter>examples/optimization.ipynb</jupyter>

        <p>Note that this finds the minimum value, but does not actually
        produce the $\bx$ value which mimimizes it.  This is possible
        <elib>Henrion09a</elib>, but it requires examining the dual of the
        sums-of-squares solutions (which we don't need for the goals of this
        chapter).</p>
      </example>

      <subsubsection><h1>Sums of squares on a Semi-Algebraic Set</h1>
      
        <p>The S-procedure.</p>
      
      </subsubsection>

      <subsubsection><h1>Sums of squares optimization on an Algebraic Variety</h1>
      
        <p>The S-procedure</p>

        <p>Using the quotient ring</p>

        <p>Quotient rings via sampling</p>

      </subsubsection>

      <subsubsection><h1>DSOS and SDSOS</h1>
      
      </subsubsection>

    </subsection>
    
    <subsection><h1>Solution techniques</h1>
      Interior point (Gurobi, Mosek, Sedumi, ...), First order methods
    </subsection>
  </section>

  <section id="nonlinear"><h1>Nonlinear programming</h1>

    <p>The generic formulation of a nonlinear optimization problem is     \[
    \min_z c(z) \quad \subjto \quad \bphi(z) \le 0, \]     where $z$ is a
    vector of <em>decision variables</em>, $c$ is a scalar <em>objective
    function</em> and $\phi$ is a vector     of <em>constraints</em>.  Note
    that, although we write $\phi \le 0$, this formulation     captures
    positivity constraints on the decision variables (simply multiply the
    constraint by $-1$) and equality constraints (simply list both $\phi\le0$
    and $-\phi\le0$) as well. </p>

    <p>The picture that you should have in your head is a nonlinear, potentially
    non-convex objective function defined over (multi-dimensional) $z$, with a
    subset of possible $z$ values satisfying the constraints.</p>

    <figure> <img width="80%"
    src="figures/nonlinear_optimization_w_minima.svg"/>
    <figcaption>One-dimensional cartoon of a nonlinear optimization problem. The
    red dots represent local minima.  The blue dot represents the optimal
    solution.</figcaption> </figure>

    <p>Note that minima can be the result of the objective function having zero
    derivative <em>or</em> due to a sloped objective up against a
    constraint.</p>

    <p>Numerical methods for solving these optimization problems require an
    initial guess, $\hat{z}$, and proceed by trying to move down the objective
    function to a minima.  Common approaches include <em>gradient descent</em>,
    in which the gradient of the objective function is computed or estimated, or
    second-order methods such as <em>sequential quadratic programming (SQP)</em>
    which attempts to make a local quadratic approximation of the objective
    function and local linear approximations of the constraints and solves a
    quadratic program on each iteration to jump directly to the minimum of the
    local approximation.</p>

    <p>While not strictly required, these algorithms can often benefit a great
    deal from having the gradients of the objective and constraints computed
    explicitly; the alternative is to obtain them from numerical
    differentiation.  Beyond pure speed considerations, I strongly prefer to
    compute the gradients explicitly because it can help avoid numerical
    accuracy issues that can creep in with finite difference methods.  The
    desire to calculate these gradients will be a major theme in the discussion
    below, and we have gone to great lengths to provide explicit gradients of
    our provided functions and automatic differentiation of user-provided
    functions in <drake></drake>.</p>

    <p>When I started out, I was of the opinion that there is nothing difficult
    about implementing gradient descent or even a second-order method, and I
    wrote all of the solvers myself.  I now realize that I was wrong.  The
    commercial solvers available for nonlinear programming are substantially
    higher performance than anything I wrote myself, with a number of tricks,
    subtleties, and parameter choices that can make a huge difference in
    practice.  Some of these solvers can exploit sparsity in the problem (e.g.,
    if the constraints operate in a sparse way on the decision variables).
    Nowadays, we make heaviest use of SNOPT <elib>Gill06</elib>, which now comes
    bundled with the binary distributions of <drake></drake>, but also <a
    href="http://drake.mit.edu/doxygen_cxx/group__solvers.html">support a large
    suite of numerical solvers</a>.  Note that while I do advocate using these
    tools, you do not need to use them as a black box.  In many cases you can
    improve the optimization performance by understanding and selecting
    non-default configuration parameters.</p>

    <subsection><h1>Second-order methods (SQP /
    Interior-Point)</h1></subsection>

    <subsection><h1>First-order methods (SGD / ADMM) </h1>
    
      <subsubsection id="penalty"><h1>Penalty methods</h1>
      
        <p>Augmented Lagrangian</p>
      
      </subsubsection>
      <subsubsection><h1>Projected Gradient Descent</h1></subsubsection>

    </subsection>

    <subsection><h1>Zero-order methods (CMA)</h1></subsection>

    <todo>My original list: Lagrange multipliers / KKT, Gradient descent, SQP (SNOPT, NLOPT, IPOPT), Global optimization</todo>

    <subsection><h1>Example: Inverse Kinematics</h1>
    </subsection>


  </section> <!-- nonlinear optimization -->

  <section><h1>Combinatorial optimization</h1>
    <subsection><h1>Search, SAT, First order logic, SMT solvers, LP interpretation</h1></subsection>
    <subsection><h1>Mixed-integer convex optimization</h1>
    
      <p>An advanced, but very readable book on MIP <elib>Conforti14</elib>.  Nice survey paper on MILP <elib>Vielma15</elib>.</p>
    
    </subsection>
  </section>
  <section><h1>"Black-box" optimization</h1>
    Derivative-free methods.  Some allow noisy evaluations. 
  </section>

  <todo>Polynomial root finding/homotopy, L1 optimization, QCQP, LCP/Variational inequalities, BMI, ..., warm-starting, ...</todo>


</chapter>
<!-- EVERYTHING BELOW THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->

<div id="references"><section><h1>References</h1>

<table>

<tr valign="top">
<td align="right" class="bibtexnumber">
[<a id="Nocedal06">1</a>]
</td>
<td class="bibtexitem">
Jorge Nocedal and Stephen&nbsp;J. Wright.
 <em>Numerical optimization</em>.
 Springer series in operations research. Springer, New York, 2nd ed
  edition, 2006.
 OCLC: ocm68629100.

</td>
</tr>


<tr valign="top">
<td align="right" class="bibtexnumber">
[<a id="Boyd04a">2</a>]
</td>
<td class="bibtexitem">
Stephen Boyd and Lieven Vandenberghe.
 <em>Convex Optimization</em>.
 Cambridge University Press, 2004.

</td>
</tr>


<tr valign="top">
<td align="right" class="bibtexnumber">
[<a id="Parrilo00">3</a>]
</td>
<td class="bibtexitem">
Pablo&nbsp;A. Parrilo.
 <em>Structured Semidefinite Programs and Semialgebraic Geometry
  Methods in Robustness and Optimization</em>.
 PhD thesis, California Institute of Technology, May 18 2000.

</td>
</tr>


<tr valign="top">
<td align="right" class="bibtexnumber">
[<a id="Prajna04">4</a>]
</td>
<td class="bibtexitem">
Stephen Prajna, Antonis Papachristodoulou, Peter Seiler, and Pablo&nbsp;A. Parrilo.
 <em>SOSTOOLS: Sum of Squares Optimization Toolbox for MATLAB
  Userâ€™s guide</em>, 2.00 edition, June 1 2004.

</td>
</tr>


<tr valign="top">
<td align="right" class="bibtexnumber">
[<a id="Permenter17">5</a>]
</td>
<td class="bibtexitem">
Frank&nbsp;Noble Permenter.
 <em>Reduction methods in semidefinite and conic optimization</em>.
 PhD thesis, Massachusetts Institute of Technology, 2017.
[&nbsp;<a href=" http://groups.csail.mit.edu/robotics-center/public_papers/Permenter17.pdf ">www:</a>&nbsp;]

</td>
</tr>


<tr valign="top">
<td align="right" class="bibtexnumber">
[<a id="Henrion09a">6</a>]
</td>
<td class="bibtexitem">
Didier Henrion, Jean-Bernard Lasserre, and Johan Lofberg.
 Gloptipoly 3: moments, optimization and semidefinite programming.
 <em>Optimization Methods &amp; Software</em>, 24(4-5):761--779, 2009.

</td>
</tr>


<tr valign="top">
<td align="right" class="bibtexnumber">
[<a id="Gill06">7</a>]
</td>
<td class="bibtexitem">
Philip&nbsp;E. Gill, Walter Murray, and Michael&nbsp;A. Saunders.
 <em>User's Guide for SNOPT Version 7: Software for Large -Scale
  Nonlinear Programming</em>, February 12 2006.

</td>
</tr>


<tr valign="top">
<td align="right" class="bibtexnumber">
[<a id="Conforti14">8</a>]
</td>
<td class="bibtexitem">
Michele Conforti, G&eacute;rard Cornu&eacute;jols, Giacomo Zambelli, et&nbsp;al.
 <em>Integer programming</em>, volume 271.
 Springer, 2014.

</td>
</tr>


<tr valign="top">
<td align="right" class="bibtexnumber">
[<a id="Vielma15">9</a>]
</td>
<td class="bibtexitem">
Juan&nbsp;Pablo Vielma.
 Mixed integer linear programming formulation techniques.
 <em>Siam Review</em>, 57(1):3--57, 2015.

</td>
</tr>
</table>
</section><p/>
</div>

<table style="width:100%;"><tr style="width:100%">
  <td style="width:33%;text-align:left;"><a class="previous_chapter" href=multibody.html>Previous Chapter</a></td>
  <td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
  <td style="width:33%;text-align:right;"><a class="next_chapter" href=playbook.html>Next Chapter</a></td>
</tr></table>

<div id="footer">
  <hr>
  <table style="width:100%;">
    <tr><td><a href="https://accessibility.mit.edu/">Accessibility</a></td><td style="text-align:right">&copy; Russ
      Tedrake, 2020</td></tr>
  </table>
</div>


</body>
</html>
